ABSTRACT

Educational research on the factors of student
achievement has been limited by its failure to consider the
multilevel or hierarchical nature of most data. This study used a
nonexperimental regression-based procedure, hierarchical linear
modeling (HLM), to empirically develop a predictive model of
fifth-grade achievement in reading and mathematics for a statewide
data set at both the individual student and school district levels.
The database was comprised of reading and mathematics achievement
test scores of 86,227 elementary students in Pennsylvania who were
enrolled in third grade in 1985 and in fifth grade in 1988. Findings
indicate that only a small portion of the variability in individual
achievement is potentially explainable by district-level factors. HLM
was also used to identify district-level factors that explain the
variation in district mean achievement and within-district
relationships. For example, a small effect of class size was revealed
in increasing the within-district relationship between prior ability
and student achievement. These results permit the formulation of a
wider range of policy inferences than is possible with conventional
regression analyses. One figure and six tables are included. (27
references) (LMI)
Abstract



A continuing concern of educational researchers has been determining the factors that
contribute to promoting student achievement. Those research efforts that were primarily
focused at the district or school level have been remiss in taking into account the multilevel or
hierarchical nature of most educational data. For educational research to have policy
relevance, the methodological mismatch between the multilevel nature of educational data and
the use of linear unilevel data-analytic models needs to be resolved.

This study used a nonexperimental regression-based procedure, hierarchical linear
modeling (HLM), to empirically develop a predictive model of fifth-grade achievement in
reading and mathematics for a large state-wide data set at both the individual student and
school district level. The results showed that only a small portion of the variability in individual
student achievement is potentially explainable by district-level factors. In addition, HLM was
used to identify district-level factors which explain not only the variation in district mean
achievement, but within-district structural relationships as well. The HLM analysis revealed, for
example, a small effect of class size in increasing the within-district relationship between prior
ability and student achievement. These results permit a wider range of policy inferences to be
made than would be possible with conventional regression procedures.
Introduction

The report Equality of Educational Opportunity (Coleman et al., 1966) questioned the
effect of schooling on student achievement and spawned numerous studies (see Glasman &
Biniaminov, 1981; Madaus, Airasian & Kellaghan, 1980; and Mosteller & Moynihan, 1972 for
examples) aimed at investigating the relationship between a set of input variables and student
achievement within a given school system. The studies comprising the "school effects"
literature have often arrived at conflicting conclusions due to the use of divergent sampling
procedures, different units of analysis, disparate data analysis techniques, and varying
operational definitions of the input variables and/or the outcome measures.

One of the most glaring problems has been the failure to take into account the multilevel
nature of the data inherent in most educational settings. In other words, variables of interest 
are often observed and measured at different levels of analysis, e.g., individual students and 
school districts. For the field of educational research, this has resulted in the absence of a 
consensus as to what factors promote student achievement. Moreover, from an educational 
policy perspective, this lack of consensus has led to inconsistent and counterproductive 
applications of empirical inquiry revolving around a number of policy questions such as school 
district consolidation or equity in school financing (see Geske, 1983; Guthrie, 1979). 

Recent studies (e.g., Bidwell & Kasarda, 1975; Friedkin & Necochea, 1988; Turner,
Camilli, Kroc & Hoover, 1986; Walberg & Fowler, 1987) have investigated the relationship
between so-called input variables, notably socioeconomic status, district size and expenditures,
with student achievement at the district level within a given state. The results of these studies
suggest that the role of input variables in explaining variation in student achievement is largely
a function of the level of data aggregation (see Blalock, 1964; Hannan, 1971; Langbein, 1977
for further clarification).

Thus, proper specification of a predictive model of student achievement entails not only
the inclusion of relevant predictor variables related to achievement (Cooley, 1978), but an
incorporation of the multilevel nature of the data as well. In other words, developing an
adequate predictive model of student achievement requires that the multilevel nature of a state
educational system be taken into account by indicating how variables at one level of the
system (e.g., district) might interact with variables at another level (e.g., student).

For educational research to have policy relevance, the methodological mismatch between
the hierarchical (i.e., nested/multilevel) nature of most educational data and the use of linear
unilevel data-analytic models needs to be resolved. Previous studies of this scope and
magnitude have ignored the multilevel nature of educational data, and consequently have drawn
causal inferences which have obscured rather than clarified specific policy issues. Cronbach
(1976) was one of the first investigators of the multilevel phenomenon to put it bluntly:

The majority of studies of educational effects - whether classroom experiments,
or evaluations of programs, or surveys - have collected and analyzed data in
ways that conceal more than they reveal. The established methods have
generated false conclusions in many studies (p. 1).

Failure to take the multilevel nature of the data into account can lead to incomplete or
incorrect empirical inferences with undesirable consequences for informing state-level policy
questions. In the past, the use of available aggregate-level data has been justified on the
grounds that collecting individual-level information was too costly (Langbein, 1977). But as
Aitkin and Longford (1986) ask:

If the analysis of available aggregate data leads to wrong conclusions and 
disastrous educational policies, where is the economic or educational benefit to 
be found? (p. 42). 

Questions of research design also have consequences regarding the issue of multilevel
data. In investigating the impact of particular variables within a policy context, e.g., the effect
of school district size or educational expenditures on student achievement, it is unfeasible and
unrealistic to utilize experimental design methodology requiring random assignment. For
example, students cannot be randomly assigned to school districts of varying size or rate of
educational expenditures. Thus, educational evaluation or decision-oriented research efforts
often must rely on naturally occurring data to investigate the role of a group of input variables
in explaining variation in student achievement (Cooley, 1978; Longford, 1989). Unfortunately,
the multilevel nature of educational data is not adequately addressed with conventional
methodologies (Burstein, 1980; Cooley, Bond & Mao, 1981; Cronbach, 1976).

Purposes of Research 

This paper is based on a broader study (Bernstein, 1990), which has two overlapping 
purposes. One is to develop and test a data-based predictive model which identifies the 
relevant variables associated with student achievement from a multilevel perspective, i.e., 
students within districts. The model is developed from an extant data base using conventional 
ordinary least squares regression techniques and the regression-based hierarchical linear 
modeling procedure, HLM (Bryk et al., 1988). The second purpose is to examine the 
differences in which parameters are estimated in a multilevel analysis using HLM in comparison 
to conventional analyses conducted at a single level of measurement (individual student or
school district). 

This paper addresses in part these purposes by discussing the limitations of using
linear unilevel regression procedures in modeling student achievement. An introduction to the
hierarchical linear modeling procedure is also provided as an illustration of the potential 
application of multilevel analysis in providing information to policy makers at the state level. 

Data Base 

The data base for this study is comprised of the cohort of Pennsylvania elementary 
school pupils who were enrolled in third grade in 1986 and in fifth grade in 1988, and who 
participated in the statewide Tests of Essential Learning Skills (TELS) program in reading and 
mathematics. The analyses described in this paper are based on a total population of 86,227 
students enrolled in 1794 elementary schools (in 1988) within a total of 496 school districts
throughout the state of Pennsylvania. The outcome measure for this study is a composite
academic achievement score (ACH88) comprised of a weighted average of the fifth grade
reading and mathematics test scores. In one sense, the tests are considered to be
criterion-referenced measures since they indicate whether students have met state-mandated minimum
competency levels in basic learning skills in reading and mathematics. In this capacity, the
tests employ cut-off scores to identify students in need of remedial instruction.

In addition to the outcome measure, several other variables are measured at the student 
level. One key variable is a measure of prior student ability, a composite achievement score 
(ACH86) based on performance on the third grade TELS reading and math measures. Other
student background variables that are obtained are SEX (1=male/2=female), RACE
(1=nonwhite/2=white), special education status (SEC88), compensatory education participation
in reading and math (CEPRM88), and a home environment-motivation index (HOMEMOTV). 

Variables at the district level are grouped into several categories. Three district-level 
proxies for socioeconomic status are included: a measure of a market value/personal income 
aid ratio for 1988 (AIDRATIO); the percentage of families receiving federally-sponsored
assistance for dependent children (AFDCPCT) and an aggregated mean of the home 
environment/motivation index (AVGHOMTV). In addition, a measure of average daily 
attendance divided by enrollment (ATTEND) is included to tap the level of motivation in a 
school district. 

School district size is represented by an enrollment variable (LNADM) measuring 
average daily membership over the school year 1987-88. A logarithmic transformation is used
to correct for nonlinearity of district size effect estimates due to the presence of a few
extremely large school districts. An additional size variable (POPMILE) measures population 
density per square mile to reflect an urban vs. rural distinction among school districts. 

Expenditures for schooling are measured by a number of 1987-1988 fiscal indicators.
Instructional cost per pupil (EXPEND) is derived by dividing the net instructional expenditures
of a district by average daily membership. Variables related to expenditures for schooling at
the district level are represented by average district class size (CLSIZ), mean elementary
school teacher salary in the district (TCHSAL), and ratio of the number of students to the 
number of district administrative personnel (ADMIN). In addition, variables measuring teacher 
experience (TCHSERV) and level of training (TCHLEVEL) are included. 

Finally, a number of aggregated measures are included to examine the degree of 
contextual effects. Contextual effects are defined here as the influence of peer group 
composition on student achievement. PINSE88 measures the effect of the percentage of
special education students in the district. PCTNW is the percentage of nonwhite (black and 
Hispanic) students in the district. DCEPRM88 is an aggregated measure of compensatory 
educational status at the individual student level. An additional contextual effect (DCUTTELS) 
measures the aggregated mean score of students falling below the cutoff on either reading or 
mathematics in a school district. 

Data Analytic Procedures 

The empirical development of a predictive model of student achievement was based 
on a series of individual student and district-level ordinary least squares regression analyses.
In the district-level analysis the outcome measure (DACH88) was created as an aggregated
mean of district-level achievement and regressed on a set of district-level indicators and 
aggregate measures of the student background variables. An individual level regression 
analysis was conducted with the composite achievement score (ACH88) and the student-level 
predictor variables. 

Informed by the above results, a set of analyses was then performed using the
hierarchical linear modeling program, HLM. In its simplest form, a hierarchical linear model
is comprised of two equations describing the structural relationships in a within- and a
between-group model (see Raudenbush and Bryk, 1986, 1988). The regression coefficients estimated
in the within-group model, $\beta_{jk}$ (for k predictor variables in j groups), constitute the
"microparameters" in the HLM formulation, which are then modeled as the outcome variables
in the between-group model. At this level in the hierarchy the $\beta_{jk}$ regression coefficients are
postulated to vary across groups as a function of "macroparameters", $\theta_{pk}$, which represent the
systematic effects of p group-level variables on the k within-group relationships. Of key interest
in HLM is the capturing of the variation of the structural relationships across groups through
the creation of both a within- and a between-group model. HLM uses information from both
levels of the model (e.g., individual students and school districts) and therefore does not force
researchers to choose between the two units of analysis.
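
As a rough illustration of the within- and between-group structure described above, the following Python sketch fits a separate least-squares line in each group (the "microparameters") and then regresses the resulting intercepts and slopes on a group-level variable (the "macroparameters"). This two-stage calculation only conveys the idea; it is not the estimation procedure used by the HLM program. The data are noise-free and entirely hypothetical, as are the names `ols_line` and `simulate_group` and the class-size values.

```python
from statistics import mean


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical group-level variable (e.g. average class size) for four districts.
W = [18.0, 22.0, 26.0, 30.0]
# Hypothetical student-level predictor, already centered around its district mean.
X = [-2.0, -1.0, 0.0, 1.0, 2.0]


def simulate_group(w):
    """Noise-free outcomes whose intercept and slope both depend on w (assumed values)."""
    intercept = 90.0 - 0.4 * w   # district mean falls as w grows
    slope = 0.5 + 0.01 * w       # within-district slope rises with w
    return [intercept + slope * x for x in X]


# Stage 1: within-group regressions give one (intercept, slope) pair per district.
micro = [ols_line(X, simulate_group(w)) for w in W]
intercepts = [b0 for b0, _ in micro]
slopes = [b1 for _, b1 in micro]

# Stage 2: between-group regressions model those coefficients as outcomes.
macro_for_intercepts = ols_line(W, intercepts)
macro_for_slopes = ols_line(W, slopes)

print("intercept model:", macro_for_intercepts)
print("slope model:", macro_for_slopes)
```

Because the illustrative data contain no error, the second stage recovers the assumed values (90 and -0.4 for the intercepts, 0.5 and 0.01 for the slopes) up to floating-point rounding. With real data the within-group estimates differ in precision, which is one reason a dedicated multilevel procedure is used instead of this simple two-step calculation.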

Results of District- and Individual-Level Analyses 

In the district-level analysis, an initial set of 16 predictors with DACH88, average fifth
grade achievement, as the outcome variable was entered into a multiple regression analysis
for the purpose of determining the relative importance of the input variables.^1 In order to
reach a more parsimonious model, regression coefficients with t-ratios less than 1.5 were deleted
until a model was reached with five predictors: DACH86, AVGHOMTV, AFDCPCT, LNADM,
and CLSIZ. Table 1 presents the results from the district-level analysis. The zero-order
Pearson correlation coefficient with DACH88 is represented by r, while B and Beta refer to
the unstandardized and standardized regression coefficients respectively.

Table 1

Regression Analysis for District-Level Model

Multiple R             .719
R Square               .518
Adjusted R Square      .513
Standard Error        2.536

VARIABLE          r         B      BETA         T       p
DACH86         .672      .487      .490    12.331    .000
AVGHOMTV       .513     1.849      .184     4.537    .000
AFDCPCT       -.458     -.083     -.175    -4.727    .000
LNADM         -.034     -.414     -.077    -2.329    .020
CLSIZ         -.102     -.386     -.109    -3.407    .001

In this model, which accounted for 51 percent of the observed variance in achievement,
DACH86 appeared to be the strongest predictor in terms of explaining variance in district
achievement.^2 AFDCPCT displayed a moderate negative relationship with achievement, with
LNADM and CLSIZ more weakly related.^3 AVGHOMTV exhibited a moderate positive
relationship in accordance with its zero-order correlation with DACH88.

^1The model-building strategy employed here is primarily of an empirical nature to allow
for comparisons with a multilevel analysis. The use of empirical procedures for variable
selection in regression analysis has justifiably been criticized for its substantive
short-sightedness (see Boyd & Iversen, 1979 for examples).

^2It should be kept in mind, however, that regression coefficients are unreliable indicators
of the strength of relationship between two variables.

What is the effect of deleting DACH86 from the model? The regression coefficients
for LNADM, AVGHOMTV, CLSIZ and AFDCPCT all increased in magnitude to varying degrees.
Removing DACH86 from the model not only reduces the Adj. $R^2$ from .512 to .363, but affects
the magnitude of the other parameters in the model as well.
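
The adjusted figure reported in Table 1 can be related to the unadjusted R Square through the usual degrees-of-freedom correction. The sketch below assumes that all 496 districts entered the five-predictor regression, which the text does not state explicitly; the function name `adjusted_r_squared` is illustrative.

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Degrees-of-freedom corrected R squared for an OLS model with an intercept."""
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


# Table 1 values: R Square = .518 with five predictors.
# n_cases = 496 is an assumption (the number of districts in the data base).
district_adjusted = adjusted_r_squared(0.518, n_cases=496, n_predictors=5)
print(round(district_adjusted, 3))
```

Under that assumption the result rounds to .513, the adjusted value shown in Table 1.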

For purposes of comparison, a comparable set of analyses was conducted on the
individual student-level variables and achievement. A multiple regression analysis was run
with ACH88 as the outcome variable and six predictor variables: ACH86, CEPRM88, RACE,
SEC88, SEX and HOMEMOTV. This model explained about 64 percent of the observed
variance in achievement. ACH86 was the most powerful predictor as indicated by its beta
weight. SEC88 and CEPRM88 had negative regression coefficients, while RACE and
HOMEMOTV were positively related to ACH88. There was little difference in achievement
due to the effect of SEX. On the basis of these results, SEX was dropped from the analysis
and the individual-level model was refitted with the remaining five predictors. This final model
did not suffer any drop in the adjusted $R^2$ figure and the model estimates remained virtually
identical. Table 2 presents the results for this model with correlation coefficients (r) and
unstandardized (B) and standardized (Beta) regression weights.

^3Both LNADM and CLSIZ demonstrated a "suppressor" effect in that their partial-order
correlation coefficients are higher than their zero-order correlations with achievement.
These variables were retained in the analysis, however, due to their policy relevance.

As with the district-level regression, an analysis was run without ACH86 to investigate
the effect of deleting prior ability on model specification. An analysis with the remaining four
predictor variables had an adjusted $R^2$ figure of .411, a decrease of about one-third from the
previous model. In this model, all the effects increase dramatically in size and remain in the
same direction.

Table 2

Regression Analysis for Individual-Level Model

Multiple R             .797
R Square               .636
Adjusted R Square      .636
Standard Error        8.987

VARIABLE          r         B      BETA          T       p
ACH86          .776      .675      .622    230.687    .000
HOMEMOTV       .349      .875      .082     37.224    .000
RACE           .342     5.037      .109     49.883    .000
CEPRM88       -.460    -3.462     -.122    -52.014    .000
SEC88         -.325    -6.854     -.095    -42.774    .000

Analyses conducted on the individual level can account for variables measured at a
higher level of aggregation by disaggregating the effects back to the individual level (see
Summers & Wolfe, 1977 for an example). In this manner, contextual effects can be measured
by including a constant value associated with the school district for each individual student
record. A model including the contextual effects of PCTNW, AVGHOMTV, DCEPRM88,
DCUTTELS, and PINSE88 increased the explained variation in achievement by about two
percentage points to an adjusted $R^2$ of .655. Of more importance, however, is the interpretation
of the regression coefficients in this model. In this new model with district-level contextual
effects predicting student achievement, the individual-level effects remained the same, except
for RACE, which decreased by more than 50 percent. The coefficients for the contextual
effects PCTNW, PINSE88 and DCEPRM88 were all positive whereas their zero-order
correlations with achievement were negative. AVGHOMTV, on the other hand, had a negative
effect on achievement in this model as opposed to its positive zero-order correlation with
ACH88. Only DCUTTELS had the expected effect: the higher the district mean score of
students below the cutoff on reading and math, the lower an individual student's score with the
effects of the other predictors in the model held constant.

The explanation for this counter-intuitive pattern of results probably lies in the 
simultaneous modeling of a group of individual and group-level effects. Not only are the 
individual variables correlated with their group-level counterparts, but the group-level variables 
are also intercorrelated with each other.^4 Multicollinearity within the context of simultaneously
modeling individual and group-level effects creates a difficulty in interpreting the relative
importance of the explanatory variables in the model (Boyd and Iversen, 1979).

The resolution of this problem lies in adopting a technique which allows for the explicit
modeling of individual-level relationships as a function of group-level factors. The following
section of the results is devoted to a review of the analyses employed with the hierarchical
linear modeling (HLM) program.

^4The correlations among the district-level contextual effect variables at the individual
level are higher than their corresponding intercorrelations at the district level.

Results of Multilevel Analysis

An HLM model was formulated for the purposes of analyzing district mean variability.
In this analysis, information was initially provided as to how much variation in the outcome
measure, ACH88, lay within and between districts. The model, equivalent to a one-way
random effects analysis of variance with districts treated as a random factor, was posed as:

$$Y_{ij} = \mu_j + e_{ij} \quad \text{(within-district)}$$

and

$$\mu_j = \theta + U_j \quad \text{(between-district)}$$

In this formulation, the within-district model states that the outcome variable $Y_{ij}$ (fifth
grade academic achievement of student i in district j) varies around a district mean $\mu_j$ with
independent errors $e_{ij}$ assumed to be distributed $N(0, \sigma^2)$, where $\sigma^2$ signifies within-district
variance. In turn, in the between-district model, each district's mean $\mu_j$ varies around a grand
mean $\theta$ with independent errors $U_j$ assumed to be distributed $N(0, \tau)$, with $\tau$ signifying
between-district variance.

The ratio of $\hat{\tau}$ (estimated between-district variance) to $\hat{\tau} + \hat{\sigma}^2$ (estimated between- and
within-district variance) yields an intra-district correlation coefficient, $\hat{\rho}$, which expresses the
estimated proportion of variance in the outcome measure between districts. In this study, $\hat{\tau}$
= 11.28 and $\hat{\sigma}^2$ = 185.09, with $\hat{\rho}$ = .057 indicating that approximately six percent of the variance
in Y is located between districts.
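
The intra-district correlation reported above follows directly from the two variance estimates. A minimal check in Python, using only the values given in the text (the function name is illustrative):

```python
def intraclass_correlation(between_variance, within_variance):
    """Share of total variance that lies between groups: tau / (tau + sigma^2)."""
    return between_variance / (between_variance + within_variance)


# Estimates reported in the text: tau = 11.28, sigma^2 = 185.09.
rho = intraclass_correlation(between_variance=11.28, within_variance=185.09)
print(round(rho, 3))  # 0.057
```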

The next question concerned whether district means varied significantly across districts,
or $H_0$: $\tau = 0$, where $\tau$ again represents the amount of between-district variation in terms of
means. In a large-sample test of this hypothesis, the Chi-square test statistic was equal to
19132.0 with 495 degrees of freedom, p<.001, indicating that the null hypothesis could be
rejected and that districts did show significant variability in mean achievement.

The final question centered around determining the contribution of district-level factors
to explaining the variability in district mean achievement. Note that if $\tau = 0$, then district-level
factors cannot explain variability in district means. In this between-district model, each district
mean score is predicted by district factors such as district mean home environment/motivation
(SES):

$$\mu_j = \theta_0 + \theta_1 (\text{mean SES})_j + U_{0j},$$

where $\theta_0$ is equal to the grand mean of achievement, $\theta_1$ is equal to the effect of district
mean SES on $\mu_j$, and $U_{0j}$ is assumed to be distributed $N(0, \tau)$, where $\tau$ now represents the
residual parameter variance after controlling for district mean SES.

The results showed a highly significant relationship between district SES as measured
by home environment-motivation and district mean achievement ($\theta_1$ = 5.12, t = 13.26, p<.001).
The residual parameter variance, $\tau$, after accounting for the district SES factor is now reduced
to 7.90 from 11.28. In other words, about 30 percent of the between-district variance in mean
achievement is accounted for by district home environment/motivation. Mean achievement,
however, still varied significantly across districts ($\chi^2$ = 13013, df = 494, p<.001). This indicates
that more terms need to be added to the between-district model in order to account for
additional variation in district mean achievement (i.e., the explanatory model is still
misspecified).

Table 3 displays the results for an HLM model using the variables from the OLS
district-level analysis with $\theta$ coefficient values, standard errors and t-ratios:

Table 3

HLM Results for District Mean Achievement

Residual Parameter Variance     4.33
R Square                         .62

VARIABLE     $\theta$      S.E.         T       p
DACH86           .492      .039    12.730    .000
AVGHOMTV        1.726      .306     4.361    .000
AFDCPCT         -.085      .017    -4.995    .000
LNADM           -.436      .173    -2.516    .012
CLSIZ           -.361      .111    -3.266    .001

In this model, after accounting for the effects of DACH86, AFDCPCT, AVGHOMTV, LNADM
and CLSIZ, the residual parameter variance is reduced to 4.33, a reduction of 62 percent over
the unconditional model with no district effects included. The $\theta$ effect for DACH86 indicates
a highly significant association with district mean achievement. The other four predictors, to
a lesser extent, are also significantly related to explaining variation in district mean achievement
as indicated by their $\theta$ coefficients. In this model, however, residual parameter variance in
mean achievement still varied significantly across districts ($\chi^2$ = 3352.8, df = 490, p<.001).

The strong effect of DACH86 on district mean achievement mitigates the effects of the
other district-level predictors.^5 In an attempt to assess the potential effects of these variables,
a model was formulated without DACH86. The results of this model are shown in Table 4.

Table 4

HLM Results for Model without Prior Ability

Residual Parameter Variance     6.31
R Square                         .44

VARIABLE     $\theta$      S.E.         T       p
AVGHOMTV        4.051      .406     9.983    .000
AFDCPCT         -.145      .019    -7.643    .000
LNADM           -.573      .200    -2.865    .005
CLSIZ           -.530      .127    -4.170    .000

In this new model, the $\theta$ coefficients for the four district-level variables increase
considerably. The estimated residual parameter variance, $\tau$, for this model is 6.31 which
represents a reduction of 44 percent over the unconditional model.^6 Mean achievement, as
in the previous case, still varied significantly across districts ($\chi^2$ = 6302.7, df = 491, p<.0001).
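
The percentage reductions quoted for the three between-district models can be reproduced from the residual parameter variances reported in the text, taking 11.28 as the unconditional between-district variance. The sketch below is only a bookkeeping check; the function name and the dictionary labels are illustrative.

```python
def variance_explained(tau_unconditional, tau_residual):
    """Proportional reduction in between-district parameter variance."""
    return (tau_unconditional - tau_residual) / tau_unconditional


TAU_UNCONDITIONAL = 11.28  # between-district variance with no district predictors

# Residual parameter variances reported for each between-district model.
residual_tau = {
    "district mean SES only": 7.90,
    "five predictors (Table 3)": 4.33,
    "four predictors without DACH86 (Table 4)": 6.31,
}

for label, tau in residual_tau.items():
    share = variance_explained(TAU_UNCONDITIONAL, tau)
    print(f"{label}: {share:.0%}")
```

The three lines print 30%, 62% and 44%, matching the figures given in the text and the R Square entries of Tables 3 and 4.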

^5For proper model specification, however, prior ability needs to be included in order to
control for the nonrandom manner in which students are placed into schools and districts.
Otherwise, the other variables take on importance which may merely be attributable to initial
student differences.

^6In contrast, a between-district model containing only DACH86 resulted in a reduction of
54 percent in terms of explaining parameter variance.

The remaining set of HLM analyses focused on modeling the variability in within-district
slopes as a function of district-level characteristics. In this manner, the distribution of academic
achievement was studied both within and across districts. The modeling strategy employed
here built upon the previous analysis of district mean achievement. An unconditional model
was proposed whereby district mean achievement (BASE) was modeled along with the slopes
for ACH86, HOMEMOTV and CEPRM88:

$$Y_{ij} = \beta_{j0} + \beta_{j1}(\text{ACH86}) + \beta_{j2}(\text{HOMEMOTV}) + \beta_{j3}(\text{CEPRM88})$$

In this model, where $\beta_{j0}$ = BASE achievement, ACH86, HOMEMOTV and CEPRM88
were centered around their respective district means and represent the "differentiating effect"
of each variable within districts. In the unconditional model, the questions of interest centered
around determining both whether there was a significant regression effect of the
k student-level variables on academic achievement within districts and the extent of
variation of these effects across districts. In the unconditional between-district model, each
OLS regression coefficient, $\beta_{jk}$, was in turn modeled as:

$$\beta_{jk} = \mu_k + U_{jk}, \quad \text{for } k = 0, 1, 2, 3.$$ (10)

The $\mu_k$ represent the fixed main effects (constant for each district) while the $U_{jk}$ are the random
effects which vary from district to district. These random effects represent the unique increment
to the slope contributed by district j. The results for the unconditional model are displayed in
Table 5.
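
The district-mean centering of the student-level predictors mentioned above can be sketched as follows. The records and the helper `center_within_groups` are hypothetical and only illustrate the transformation; they are not drawn from the study's data.

```python
from collections import defaultdict
from statistics import mean


def center_within_groups(records, group_key, value_key):
    """Return copies of the records with value_key re-expressed as a deviation
    from the mean of its own group (here, the student's district)."""
    values_by_group = defaultdict(list)
    for record in records:
        values_by_group[record[group_key]].append(record[value_key])
    group_means = {group: mean(values) for group, values in values_by_group.items()}
    return [
        {**record, value_key + "_centered": record[value_key] - group_means[record[group_key]]}
        for record in records
    ]


# Hypothetical student records: two districts with different numbers of students.
students = [
    {"district": "A", "ACH86": 70.0},
    {"district": "A", "ACH86": 90.0},
    {"district": "B", "ACH86": 60.0},
    {"district": "B", "ACH86": 80.0},
    {"district": "B", "ACH86": 100.0},
]

centered = center_within_groups(students, "district", "ACH86")
for row in centered:
    print(row["district"], row["ACH86_centered"])
```

After centering, each district's deviations sum to zero, so a within-district intercept fitted to the centered predictors equals that district's mean outcome.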

The $\theta$ coefficients provide estimates of the mean fixed effects. Each of these mean
fixed effects is statistically significant at the .05 level, indicated by the individual t-statistics
testing the hypothesis (e.g., $H_0$: $\theta_k = 0$) of whether the average within-district coefficient = 0.

The results for the random effects $U_{jk}$ indicate heterogeneity of regression for the four
coefficients across districts. The BASE, ACH86, HOMEMOTV and CEPRM88 effects all varied
significantly across districts as indicated by the results of the Chi-square tests.^7 For example,
in the case of the ACH86 slope, a test of the null hypothesis $H_0$: $\tau_{ACH86} = 0$ yields a test statistic
of 1903.1 which is then compared to a Chi-square critical value with 491 degrees of freedom.
The result for HOMEMOTV (p = .048), on the other hand, suggests that the residual parameter
variance for this effect is quite close to 0.

^7Note that these Chi-square tests are conceptually equivalent to testing for homogeneity
of regression in an ANCOVA model. The distinction here is that HLM permits an
explanatory model to be posited which may account for the random variability in slopes
across districts.





Table 5

HLM Results for Unconditional Model

Fixed Effects              $\theta$       S.E.          T       p
BASE, $\beta_0$             82.8529      .1613    513.732    .000
ACH86, $\beta_1$              .7087      .0067    105.572    .000
HOMEMOTV, $\beta_2$           .7962      .0244     32.642    .000
CEPRM88, $\beta_3$          -3.7233      .1347    -27.649    .000

                      Estimated Parameter
Random Effects             Variance         df    $\chi^2$       p
BASE ACHIEVEMENT            12.2228        491     47763.0    .000
ACH86 SLOPE                   .0156        491      1903.1    .000
HOMEMOTV SLOPE                .0093        491       544.4    .048
CEPRM88 SLOPE                5.2584        491      1464.3    .000

RELIABILITIES OF DISTRICT-LEVEL RANDOM EFFECTS

BASE ACHIEVEMENT     = .946
ACH86 SLOPE          = .639
HOMEMOTV SLOPE       = .019
CEPRM88 SLOPE        = .493


Information about the reliability of the random effects in the model is also provided.
These reliability indicators are derived from the ratio of estimated parameter variance in each
regression coefficient, $\tau_{kk}$, to the total observed variance in the estimated OLS slopes,
$\tau_{kk} + \mathrm{Var}(\hat{\beta}_{jk} \mid \beta_{jk})$. The estimate for BASE is highly reliable, .946. This indicates that almost all of
the total observed variance in base achievement is potentially explainable by district-level
factors. The regression coefficients are less reliable, ranging from a high of .639 for ACH86
to a low of .019 for HOMEMOTV. In this latter instance, approximately 98 percent of the
observed variation in HOMEMOTV is attributable to sampling variance and not explainable by
district-level characteristics.^
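A minimal sketch of the reliability ratio may help make the 98 percent figure concrete. It treats each reliability as a single ratio of parameter variance to parameter-plus-sampling variance, which is a simplification (the reported indicators presumably summarize this ratio across districts), and the helper names are hypothetical. The parameter variances and reliabilities are taken from Table 5; the implied sampling variances are back-calculated illustrations, not reported results.

```python
def reliability(parameter_variance: float, sampling_variance: float) -> float:
    """Share of observed variance in a coefficient that is parameter variance."""
    return parameter_variance / (parameter_variance + sampling_variance)


def implied_sampling_variance(parameter_variance: float, rel: float) -> float:
    """Sampling variance implied by a parameter variance and a reliability."""
    return parameter_variance * (1.0 - rel) / rel


# (estimated parameter variance, reliability) as reported in Table 5.
TABLE5 = {
    "BASE": (12.2228, 0.946),
    "ACH86": (0.0156, 0.639),
    "HOMEMOTV": (0.0093, 0.019),
    "CEPRM88": (5.2584, 0.493),
}

for name, (tau, rel) in TABLE5.items():
    v = implied_sampling_variance(tau, rel)
    noise_share = 1.0 - reliability(tau, v)
    print(f"{name:9s} parameter var = {tau:8.4f}  "
          f"implied sampling var = {v:8.4f}  "
          f"share due to sampling = {noise_share:.1%}")
```

For HOMEMOTV the share attributable to sampling variance is 1 - .019 = .981, which is the "approximately 98 percent" cited above.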

The final step in the multilevel analysis involved fitting a model to demonstrate the
capability of the HLM program to model the effects of district-level covariate and policy
manipulable variables on within-district slopes. In this final model, a "sensitivity" analysis was
conducted to determine the most economical set of covariates to accompany the policy
variables of LNADM, EXPEND and CLSIZ. In the interests of parsimony and interpretability,
only these three policy variables were modeled.^ In this model, the estimated residual
parameter variance for HOMEMOTV was close to 0, indicating that the homogeneity hypothesis
of residual variance for this parameter could be sustained and that any remaining variance
could be attributed to sampling variability. It was thus decided to "fix" the residual variance in
the HOMEMOTV slope to 0, whereby HOMEMOTV was treated as a fixed component with only
an intercept term and no residual variation to explain. The results from the final explanatory
model are presented in Table 6.

In terms of district mean achievement, both LNADM and CLSIZ had negative $\theta$
coefficients, replicating the results from the ordinary least squares district-level analysis.
EXPEND had a negligible negative effect on district achievement. In terms of the ACH86
slope, CLSIZ had a small positive effect on the differentiating effect of prior ability on
achievement. Within the HLM formulation of centering within-group variables around their
group means, the intercept $\beta_{j0}$ now represents group mean achievement. This choice of metric
also allows for unambiguous statements to be made concerning the within-group coefficients

^A potential explanation for the low reliability of the home environment-motivation effect
is the lack of variation in HOMEMOTV between and within districts. Another possible
explanation is collinearity among the within-district slopes. In a model without ACH86, the
parameter variance for HOMEMOTV increased from .0093 to .5923 and the reliability
indicator for the slope similarly rose from .019 to .423.

^TCHLEVEL was also initially considered, but this variable was subsequently dropped
from the analysis due to its possible suppressor effect (see results of OLS district-level
analysis).

Table 6

HLM Results for Explanatory Model of District-Level Effects

Fixed Effects                  θ      S.E.         T         p

FOR BASE, β0
  BASE                   31.4679    2.9567    10.643      .000
  DACH86                   .5737     .0322    17.835      .000
  AVGHOMTV                1.8172     .4356     4.172      .000
  AFDCPCT                 -.0498     .0146    -3.415      .001
  LNADM                   -.4658     .1760    -2.647      .009
  EXPEND                  -.0002     .0002     -.889      .374
  CLSIZ                   -.3666     .1166    -3.144      .002

FOR ACH86, β1
  BASE                     .3845     .1306     2.945      .004
  AVGHOMTV                 .0546     .0207     2.633      .009
  PCTNW                   -.0012     .0005    -2.190      .028
  DCEPRM88                -.0044     .0006    -7.147      .000
  DCUTTELS                 .1929     .0383     5.042      .000
  LNADM                   -.0100     .0093    -1.078      .281
  EXPEND                  .00001    .00001      .785      .433
  CLSIZ                    .0122     .0060     2.043      .041

FOR HOMEMOTV, β2*
  BASE                     .8147     .0232    35.057      .000

FOR CEPRM88, β3
  BASE                   -5.0274    1.8118    -2.775      .006
  LNADM                   -.0928     .1969     -.471      .637
  EXPEND                   .0006     .0002     2.916      .004
  CLSIZ                    .0141     .1377      .102      .919

* The residual variance for this parameter has been set to zero.

                       Estimated
                       Parameter
Random Effects          Variance        df        χ²         p

BASE ACHIEVEMENT          5.5329       485    8138.0      .000
ACH86 SLOPE                .0108       484    1773.0      .000
CEPRM88 SLOPE             5.0019       488    1480.4      .000

and their relationship to the group mean on the outcome measure (Bryk et al., 1988).
Consequently, the differentiating effect of a within-district slope coefficient can be interpreted
here as the degree to which differences in a within-district characteristic (e.g., prior ability) relate to
differences in fifth-grade achievement. Larger class size, thus, tended to magnify the gap
between students scoring low and high on third grade achievement. Conversely, smaller class
size resulted in reducing this gap by preventing the low achievers from falling further behind
in terms of achievement. Figure 1 depicts this relationship between the district-level effect of
class size and the ACH86 slope. The ACH86 slope modeled with CLSIZ is steeper than the
unconditional ACH86 slope, indicating that larger class size has the effect of increasing the gap
between low and high students on third grade achievement.
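The size of this class-size effect can be illustrated with a short calculation from the CLSIZ coefficient in the ACH86 slope equation of Table 6. The sketch assumes that one unit of CLSIZ corresponds to one student per class, that the other district-level predictors are held constant, and that the effect is linear; the function names and the example values (5 additional students, a 20-point gap in third-grade scores) are hypothetical and chosen only for illustration.

```python
# Coefficient of CLSIZ in the ACH86 slope equation (Table 6).
CLSIZ_ON_ACH86_SLOPE = 0.0122


def slope_shift(extra_students: float) -> float:
    """Predicted change in the within-district ACH86 slope."""
    return CLSIZ_ON_ACH86_SLOPE * extra_students


def gap_change(prior_gap: float, extra_students: float) -> float:
    """Predicted change in the fifth-grade gap between two students who
    differ by `prior_gap` points on third-grade achievement."""
    return slope_shift(extra_students) * prior_gap


# Hypothetical scenario: classes larger by 5 students, two students
# who scored 20 points apart in third grade.
shift = slope_shift(5)
extra_gap = gap_change(20, 5)
print(f"ACH86 slope shift: {shift:.4f}")
print(f"Additional fifth-grade gap: {extra_gap:.2f} points")
```

Under these assumptions, the predicted widening is on the order of one point, consistent with the description of the effect as small.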

Figure 1

Effect of Class Size on ACH86 Slope

[Line plot of ACH88 (vertical axis, 0 to 100) against ACH86 (horizontal axis, 0 to 100), showing two lines: the ACH86 slope and the ACH86 slope with CLSIZ.]

LNADM and EXPEND, on the other hand, exhibited no significant effects on either
decreasing or increasing the differentiating effect of ACH86 on achievement. In addition, the
policy variables LNADM and CLSIZ were negligibly related to explaining the variation in the
CEPRM88 slope. EXPEND, however, did show a small positive effect in reducing the
differentiating effect of compensatory education performance on achievement. The higher a
district's level of instructional expenditures per pupil, the less of a gap between those students
receiving and not receiving remedial services in reading and math. This effect ($\theta$ = .0006) may
be too minute, however, to have any impact in a practical significance sense.

The explanatory model accounted for 54.7% of the parameter variance in BASE, 30.8%
in ACH86 and 4.9% in CEPRM88. The Chi-square statistics for the homogeneity of regression
test for the three random terms in the model were significant, indicating that this model was
inadequate in terms of explaining the parameter variance among districts.^
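The percentages of parameter variance explained follow from comparing the estimated parameter variances of the unconditional model (Table 5) with the residual parameter variances of the explanatory model (Table 6). A minimal sketch of that calculation is given here; the function name is hypothetical, and the formula used, the proportional reduction in parameter variance, is assumed to be the one behind the reported figures because it reproduces them.

```python
# Estimated parameter variances from Table 5 (unconditional model).
UNCONDITIONAL = {"BASE": 12.2228, "ACH86": 0.0156, "CEPRM88": 5.2584}
# Residual parameter variances from Table 6 (explanatory model).
EXPLANATORY = {"BASE": 5.5329, "ACH86": 0.0108, "CEPRM88": 5.0019}


def proportion_explained(unconditional: float, residual: float) -> float:
    """Proportional reduction in parameter variance."""
    return (unconditional - residual) / unconditional


explained = {
    name: proportion_explained(UNCONDITIONAL[name], EXPLANATORY[name])
    for name in UNCONDITIONAL
}

for name, share in explained.items():
    print(f"{name:8s} {share:.1%} of parameter variance explained")
```

HOMEMOTV does not appear because its residual variance was fixed to zero in the explanatory model.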

Summary 

Policy relevant variables, thus, did not make a major contribution in terms of explaining
within-district variability. Proper model specification in this case entails finding additional or
better measures of district-level indicators which could explain the differences in base
achievement and slope variation across school districts. Use of a multilevel modeling
procedure, such as HLM, allows one, however, to draw a broader range of policy inferences
than one would under a conventional regression framework. That is, one can identify those
factors which explain not only variation in the outcome measure, but within-group structural
relationships as well, such as within-district slopes. For example (see Table 6), the effects of
district size and expenditures can take on different interpretations. In terms of district mean
achievement, district size had a moderate negative effect while the effect of expenditures was
negligible. However, district size had no effect on increasing the gap either between students
scoring low and high on third grade achievement (ACH86 slope) or between those receiving and not
receiving remedial services in reading and math (CEPRM88 slope). Thus, from one set of
results, one could make a case against school district consolidation with the argument that
district size has a negative effect on district achievement. By examining relationships within
districts, however, one could also argue for consolidation by showing that district size has a
negligible effect on low achievers or those students receiving compensatory instruction falling
further behind in achievement. To make informed policy decisions requires, therefore, as much
data-based information as possible which can be brought to bear on an issue.

^These Chi-square statistics are only indications of the statistical fit of a model. Due to
the presence of a few large school districts in this study producing very small standard
errors, these statistics can become unduly inflated. Consequently, it may be preferable to
examine the "substantive" fit of a model rather than rely strictly on statistical criteria.

Similarly, additional information is available for policy purposes from an HLM analysis 
with regard to the class size variable (CLSIZ). As shown in Table 6, class size has a negative 
effect on district mean achievement. The HLM analysis reveals, in addition, a small effect of 
class size in increasing the differentiating effect of prior ability (ACH86 slope) on achievement. 
This supplemental information could be used to inform the policy debate on decreasing class 
size. 

How does an HLM analysis employing multilevel data differ from conventional analyses
conducted at the individual or district level? The results from Tables 1 and 3 show a striking
similarity between the estimates obtained from the ordinary least squares regression
district-level analysis and from the HLM analysis of district mean achievement. This is not surprising
given the fact that in both instances mean district achievement is being predicted by the same
set of variables. The multilevel HLM analysis, however, accounted for a greater share of the
explained variance in achievement (62 percent) than the OLS analysis at the district level (51
percent). This is explained by the fact that the hierarchical linear modeling procedure partitions
the variance in district means into parameter variance and sampling variance as opposed to
conventional regression analysis which does not make this distinction. Thus, in the HLM case,
the set of five predictor variables explained 62 percent of the "true" differences in district
achievement potentially explainable, which were not attributable to sampling variability.

This particular distinction between a multilevel modeling procedure such as HLM and
ordinary least squares estimation procedure is critically important in terms of accurately
measuring the extent of school or district effectiveness. Failure to properly partition the
variation between groups into that portion capable of explanation as opposed to noise has
led to serious underestimation of school effects in the past (see Willms, 1984).

The above results suggest various implications for policy research, particularly in terms
of how policy makers can potentially be misled by the results of inappropriate methodological
models given the hierarchical nature of educational data. For one, the importance of correctly
partitioning variability among schools or school districts has clear implications for educational
policy research. As indicated in the results, only a small portion of variability in student
achievement is potentially explainable by group-level factors. Analyses employing
district-level aggregate data, thus, are only capable of explaining a finite portion of the variation in
achievement. District-level analyses explain district-level achievement.^ The district-level
model in this study with an $R^2$ of .51 is only explaining, in fact, 51 percent of the potentially
explainable portion (in this case six percent) of the variation in student achievement. Adequate
model specification, thus, depends on more than the inclusion of all relevant predictor variables
related to the outcome measure. Failure to take into account the fact that students are
grouped within schools and/or school districts, for example, can greatly obscure the findings
of an analysis solely employing aggregate-level data.
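The arithmetic behind this point can be made explicit. The sketch below multiplies the share of variance in student achievement lying between districts (six percent) by the district-level R-squared (.51) to show how little of the total variation the aggregate model addresses; the variable names are hypothetical and the calculation simply restates the figures given above.

```python
# Figures taken from the text above.
BETWEEN_DISTRICT_SHARE = 0.06  # portion of student-level variance lying between districts
DISTRICT_R_SQUARED = 0.51      # R-squared of the district-level OLS model

# Share of total student-level variance addressed by the district-level model.
explained_share_of_total = DISTRICT_R_SQUARED * BETWEEN_DISTRICT_SHARE

# Everything else is either unexplained between-district variance
# or within-district variance, which an aggregate model cannot address.
remaining_share = 1.0 - explained_share_of_total

print(f"Explained share of total variance: {explained_share_of_total:.2%}")
print(f"Remaining share of total variance: {remaining_share:.2%}")
```

On these figures the district-level model accounts for roughly three percent of the total variation in individual achievement.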

Can policy questions be informed by results of analyses conducted with individual-level
data? The individual level OLS model, in contrast to the district-level OLS analysis, had a
higher $R^2$ of .64 primarily due to the stronger relationship between third and fifth grade
achievement at the student level. Individual level models, however, while explaining variation
in student-level achievement, are incapable of modeling district-level factors (policy variables
and contextual effects) without biasing other effects in the model. Simultaneously modeling
variables measured at different levels of aggregation results in inadequate estimation of model
parameters due to the pattern of intercorrelations between the individual and district-level effects
(see Aitkin & Longford, 1986; Boyd & Iversen, 1979, for further discussion of this issue).

^Of that portion, it is crucial to identify those manipulable policy variables which have a
potential impact on individual student achievement (Langbein, 1977).

The purpose of the above comparisons was not to demonstrate the superiority of one
level of analysis over the other, but rather to show that comparisons of analyses conducted at
different levels of aggregation may not be meaningful. The question of which level of analysis
to choose is clearly the wrong question to ask. The emphasis, instead, should be on
developing a model to fit how the data is generated.

The focus of an investigation of educational effects should be on the proper 
specification of the substantive analytical model(s) rather than on making a 
choice among competing units of analysis (Burstein, 1980, p. 161). 

Moreover, variables measured at the individual student level often represent different 
constructs from their school- or district-level counterparts (Cronbach, 1976). For example, the 
student-level indicator of SES in these data, HOMEMOTV, measures the propensity of an 
individual student to do well on achievement tests. The district-level aggregate, AVGHOMTV,
even though it may have the same functional relationship with district-level achievement as 
HOMEMOTV does with individual-level achievement, takes on a different meaning in terms of 
its relationship with other district-level indicators such as per-pupil spending. The average 
district SES score may reflect more the fact that higher SES families enroll their children in 
better schools. It does not guarantee, however, that a student located in a high SES district 
will do well on achievement. Nor can a conclusion be drawn from an individual's high SES as 
to how that student's district will perform in terms of achievement. The relevant question to 
ask in this case is the effect of average SES composition of a school district on within-district 
variation in individual SES. 

These analyses have demonstrated, moreover, the importance of specifying prior ability
in a model to reduce the bias due to student self-selection.^ Removing prior ability from the
district-level model reduced the $R^2$ by 30 percent and affected the magnitude of the other
parameters in the model. In the individual-level model, removing prior ability reduced the $R^2$
by 36 percent and caused the remaining effects to increase dramatically in size.

^Conditioning on prior ability reduces but cannot eliminate initial differences in
academic achievement.

Furthermore, analyses employing proxies such as SES for incoming student ability are
simply inadequate for addressing this bias. In the HLM model of district achievement, prior 
ability (DACH86) accounted for 80 percent more explanatory power than SES (AVGHOMTV) 
in predicting district mean achievement. The bias introduced into a predictive model when 
controlling solely for student SES and ignoring prior ability has been extensively documented 
in the literature (see Gray, 1989, for examples). 

As a recommendation for state data collection efforts, stronger and more reliable 
measures of SES are needed, both at the individual and district level. Indicators at the 
individual level based on student self-report data are subject to problems of unreliability 
endemic to questionnaire data. The student self-report data collected in this study are 
particularly prone to this problem considering the age level of the students. The extremely 
low reliability estimate for the HOMEMOTV slope in the HLM analysis (see Table 5) 
underscores this point. At the district level, a factor-weighted composite index would provide 
an improvement over the use of three disparate indicators for SES. States interested in 
assessing district effectiveness would be well-served by systematically collecting this
information. 

In short, the above discussion serves to emphasize the importance of proper model
specification, both from the perspective of adequately controlling for student self-selection, and
from capturing the educational processes underlying the data through the delineation of a
multilevel analytical model. The statistical model proposed, however, should match as closely
as possible the substantive model responsible for generating the data within a hierarchical
context. In this manner, a multilevel analysis guided by estimation techniques employed in this
study allows the educational researcher to not only avoid the problems of aggregation bias and
model misspecification, but to identify the factors responsible for explaining the variation in
individual academic achievement.

The finding in this study of only six percent of the variance in achievement lying
between districts replicates a similar finding from the Coleman et al. (1966) study, whereby
the majority of variance was found within rather than between schools. Similarly, studies
conducted by Gray (1989) found very small proportions of the variance in student achievement
existing among British local education authorities. In this present study the lack of a strong
grouping effect at the school district level would lead one instead to look at potential
school-level factors to explain the variability in student achievement. Better and more complete
estimates of student attributes to redress the problems of missing data and non-response, as
well as the employment of longitudinal data to measure long-term educational effects, are other
concrete examples of developing more adequately specified models of student achievement.
The emphasis, in any case, should be on the development of simple, parsimonious models for
ease of interpretation.

It is not expected that this approach using hierarchical linear modeling can provide a 
conclusive conceptual answer to the question of correct model specification of student 
achievement. Due to the non-random manner in which students are grouped into schools 
and districts, model misspecification in terms of biased parameter estimates is likely to continue 
to be a problem, no matter how sophisticated the statistical methodology employed. Model 
misspecification and biased estimation can be reduced, however, with the development of 
stronger conceptual models. The sophistication of our conceptual modeling efforts has 
unfortunately lagged behind recent developments in statistical methodology. By specifying the 
process through which variables measured at different levels are related, however, multilevel
analysis offers hope of an improvement over conventional regression analyses confounded by
aggregation bias and model misspecification.